Liquid Clustering config for table materialization #398

ammarchalifah · 2023-07-28T11:44:19Z

Resolves #399

Description

With this change, dbt users can supply a liquid_clustered_by model config with a column (or list of columns) to use as clustering keys via Delta's Liquid Clustering feature. The change makes a small modification to the generated DDL query and does nothing for Python-based models (I couldn't find a Python API for Liquid Clustering).
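To make the DDL change concrete, here is a minimal Python sketch of how a liquid_clustered_by value (a single column or a list) could be turned into a `cluster by (...)` clause. This is not the adapter's actual Jinja macro: the function names `render_cluster_by` and `render_create_table_as` are hypothetical and exist only for illustration. The clause format follows the string expected by the unit test in this PR.

```python
from typing import List, Optional, Union


def render_cluster_by(liquid_clustered_by: Optional[Union[str, List[str]]]) -> str:
    """Return a `cluster by (...)` clause, or an empty string when unset.

    Hypothetical helper for illustration; the real adapter does this in a macro.
    """
    if not liquid_clustered_by:
        return ""
    # Accept either a single column name or a list of column names.
    if isinstance(liquid_clustered_by, str):
        columns = [liquid_clustered_by]
    else:
        columns = list(liquid_clustered_by)
    return "cluster by ({})".format(",".join(columns))


def render_create_table_as(
    relation: str,
    select_sql: str,
    liquid_clustered_by: Optional[Union[str, List[str]]] = None,
) -> str:
    """Assemble a simplified CREATE TABLE AS statement (illustrative only)."""
    parts = [f"create or replace table {relation}", "using delta"]
    clause = render_cluster_by(liquid_clustered_by)
    if clause:
        parts.append(clause)
    parts.append(f"as {select_sql}")
    return " ".join(parts)
```

For example, `render_cluster_by(["cluster_1", "cluster_2"])` yields the same `cluster by (cluster_1,cluster_2)` fragment that the unit test asserts on.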

Checklist

I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

Signed-off-by: Ammar Chalifah <[email protected]>

ammarchalifah · 2023-07-28T16:02:13Z

Tested on a private Databricks cluster; it works as intended. The resulting table is clustered correctly.

ammarchalifah · 2023-08-05T14:29:57Z

Hi, could you help review this PR? @andrefurlan-db @susodapop @ueshin

susodapop

Change looks good with one question 👍 Thanks for this contribution.

susodapop · 2023-08-09T15:41:57Z

tests/unit/macros/test_adapters_macros.py

@@ -154,6 +176,7 @@ def test_macros_create_table_as_all_delta(self):
            "create or replace table my_table "
            "using delta "
            "partitioned by (partition_1,partition_2) "
+            "cluster by (cluster_1,cluster_2)"


From the docs (emphasis mine):

Clustering is not compatible with partitioning or ZORDER, and requires that the Azure Databricks client manages all layout and optimization operations for data in your table.

Therefore the SQL rendered by self._render_create_as() here, while it matches what the test expects, would technically raise an exception if it were executed. Should we enforce any kind of block, or just allow the compute cluster to raise an exception if the dbt user configures both partition_by and liquid_clustered_by?

That's a good question. I'm in favor of letting the compute cluster raise the exception, but I'm open to suggestions.

I'm pinging a couple other engineers internally for an opinion on this.

My stance has been to let Databricks surface the issues, rather than intercepting in the adapter. I'm not sure if we're doing this consistently everywhere. My reasoning is that, for any rule we hard code in, the Databricks implementation could change, and it would be better for that functionality to be available to customers without us having to spin a new release.
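For reference, the adapter-side block that was considered (and not adopted) might look roughly like the sketch below. It is illustrative only: `find_layout_conflicts` is a hypothetical name, the config is modeled as a plain mapping, and the `zorder` key is an assumption standing in for whatever config carries ZORDER columns. The thread's conclusion was to let Databricks raise the error instead.

```python
from typing import Any, List, Mapping

# Layout configs that the quoted docs describe as incompatible with clustering.
# "zorder" is an assumed key name; "partition_by" comes from the discussion above.
CONFLICTING_KEYS = ("partition_by", "zorder")


def find_layout_conflicts(config: Mapping[str, Any]) -> List[str]:
    """Return the configured keys that conflict with liquid_clustered_by.

    Hypothetical pre-check; returns an empty list when clustering is not
    configured or when no conflicting layout option is set.
    """
    if not config.get("liquid_clustered_by"):
        return []
    return [key for key in CONFLICTING_KEYS if config.get(key)]
```

Hard-coding a list like `CONFLICTING_KEYS` is exactly the kind of rule that would need a new adapter release if Databricks later relaxed the restriction, which is the reasoning given above for not intercepting.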

benc-db · 2023-08-09T17:00:35Z

@ammarchalifah can you file an issue for us to add this capability for python models as well? The python model support is currently in flux, so no need for you to make the change, but I would appreciate if you capture the expected behavior (matching your sql tests) in an issue.

ammarchalifah · 2023-08-09T17:42:38Z

Yes, opened a new issue here.

susodapop

Thanks again! I've updated this to merge to a staging branch on the main repository. This enables the staging branch to run our e2e tests. Once those pass, we'll merge to main.

ammarchalifah-bolt · 2023-08-09T18:41:38Z

@susodapop Thank you! Really happy to contribute and can't wait to try out Liquid Clustering in our system!

ammarchalifah · 2023-08-10T10:06:42Z

All tests passed. Should we merge the PR? @susodapop

ammarchalifah · 2023-08-14T08:32:53Z

Any update on this PR? @susodapop

ammarchalifah requested review from andrefurlan-db, susodapop and ueshin as code owners July 28, 2023 11:44

Ammar Chalifah and others added 5 commits July 28, 2023 18:16

Initial commit, adding liquid_clustered_by in adapter macros

65ca3b7

Signed-off-by: Ammar Chalifah <[email protected]>

Add unit test

e2f4b39

Signed-off-by: Ammar Chalifah <[email protected]>

Exclude logging for liquid clustering in python language

e7bb541

Signed-off-by: Ammar Chalifah <[email protected]>

Lint

98efe02

Signed-off-by: Ammar Chalifah <[email protected]>

Lint black

d2d43f9

Signed-off-by: Ammar Chalifah <[email protected]>

ammarchalifah-bolt force-pushed the feature-liquid-clustering branch from 288eaa9 to d2d43f9 Compare July 28, 2023 15:27

Add changelog

f78c151

Signed-off-by: Ammar Chalifah <[email protected]>

ammarchalifah changed the title ~~Feature liquid clustering~~ Liquid Clustering config for table materialization Jul 28, 2023

susodapop self-assigned this Aug 9, 2023

susodapop reviewed Aug 9, 2023

ammarchalifah mentioned this pull request Aug 9, 2023

Liquid Clustering configuration for Python-based models #411

Closed

Merge branch 'main' into feature-liquid-clustering

95ea065

susodapop changed the base branch from main to staging-liquid-clustering August 9, 2023 18:21

susodapop approved these changes Aug 9, 2023

rcypher-databricks approved these changes Aug 9, 2023

susodapop merged commit 154e5c6 into databricks:staging-liquid-clustering Aug 14, 2023
18 checks passed

susodapop mentioned this pull request Aug 14, 2023

Liquid Clustering config for table materialization (#398) #415

Merged

3 tasks

susodapop pushed a commit that referenced this pull request Aug 15, 2023

Liquid Clustering config for table materialization (#398) (#415)

b632484

Signed-off-by: Ammar Chalifah <[email protected]> Co-authored-by: Ammar Chalifah <[email protected]>

susodapop pushed a commit that referenced this pull request Aug 29, 2023

Liquid Clustering config for table materialization (#398) (#415)

45bb6fd

Signed-off-by: Ammar Chalifah <[email protected]> Co-authored-by: Ammar Chalifah <[email protected]> (cherry picked from commit b632484)
